Open Data and Project Co-financing of Media Content

Tijana Blagojev - R-Ladies Belgrade

Getting started

The project "Through Open Data to Better Project Co-financing of Media Content"

Interesting parts of the web application

Challenges

  • A large number of non-uniform funding decisions in PDF format

  • Project applicants recorded under different names in the decisions

  • No information in the decisions about the media outlets in which the project will be carried out

  • Searching the APR (Serbian Business Registers Agency) is very limited, and so is the quality of the entered data for machine reading

  • Occasional dropped letters and unnecessary spaces (OpenRefine to the rescue)

  • Not enough time to check everything once more :)
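The stray spaces and name variants mentioned above were handled in OpenRefine. A rough R equivalent of the simplest steps might look like the sketch below. The helper name `normalize_name` and the sample strings are hypothetical, made up for illustration; real applicant names would need more rules and a manual review.

```r
# Hypothetical helper: normalise applicant names before comparing them.
normalize_name <- function(x) {
  x <- trimws(x)                      # drop leading and trailing whitespace
  x <- gsub("[[:space:]]+", " ", x)   # collapse repeated internal whitespace
  toupper(x)                          # ignore differences in capitalisation
}

# Made-up examples of one applicant written three different ways.
raw_names <- c("  Example Media  doo", "EXAMPLE MEDIA DOO", "Example  media doo ")
clean_names <- normalize_name(raw_names)

# After normalisation the three variants collapse into a single spelling.
unique(clean_names)
```

This only catches formatting differences. Dropped letters or genuinely different registered names still need a lookup against the applicant's registration number.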

What we learned

  • Keep the data collection methodology as simple as possible

  • If researchers find something in the decisions that the methodology does not cover, they must ask what to do

  • The APR challenges were partly solved with a tool that makes searching easier.

  • Explain to researchers why it is important for the data to be uniform
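One way to enforce uniformity is to check every submitted table against the agreed column names before merging it. The sketch below assumes the ten column names of the final dataset are the agreed set; the function name `check_columns` and the sample table are hypothetical.

```r
# Column names of the final dataset, used here as the agreed template.
expected_cols <- c(
  "ORGAN KOJI RASPISUJE KONKURS/OPŠTINA",
  "MATIČNI BROJ GRADA/OPŠTINE",
  "GODINA",
  "PODNOSILAC PROJEKTA",
  "MATIČNI BROJ PODNOSIOCA",
  "NAZIV MEDIJA",
  "NAZIV PROJEKTA",
  "TEMA PROJEKTA",
  "SREDSTVA U DINARIMA",
  "SREDSTVA U EVRIMA"
)

# Hypothetical helper: report columns that are missing or not in the template.
check_columns <- function(df, expected = expected_cols) {
  list(
    missing    = setdiff(expected, names(df)),
    unexpected = setdiff(names(df), expected)
  )
}

# Made-up submission with three columns, one of them named inconsistently.
submitted <- data.frame(matrix(NA, nrow = 1, ncol = 3))
names(submitted) <- c("GODINA", "NAZIV PROJEKTA", "Tema projekta")

result <- check_columns(submitted)
result$unexpected   # names that deviate from the template
result$missing      # template columns not found in the submission
```

Running a check like this on each municipality's table makes naming differences visible before they end up in the merged dataset.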

Key figures

  • The dataset has a total of 11,677 entries and 10 columns
  • It covers 154 funding bodies: the Provincial Secretariat, the Ministry of Culture and Information, and 152 local self-government units
  • Nine thematic areas were assigned based on the project titles, and the total amount of funds is 70,821,309 euros
## Rows: 11,677
## Columns: 10
## $ `ORGAN KOJI RASPISUJE KONKURS/OPŠTINA` <chr> "Ada", "Ada", "Ada", "Ada", "A…
## $ `MATIČNI BROJ GRADA/OPŠTINE`           <dbl> 80012, 80012, 80012, 80012, 80…
## $ GODINA                                 <dbl> 2015, 2016, 2017, 2018, 2019, …
## $ `PODNOSILAC PROJEKTA`                  <chr> NA, NA, NA, "PANONIJA MEDIA DO…
## $ `MATIČNI BROJ PODNOSIOCA`              <chr> NA, NA, NA, "21346365", "21443…
## $ `NAZIV MEDIJA`                         <chr> NA, NA, NA, "Produkcija", "Pro…
## $ `NAZIV PROJEKTA`                       <chr> "Sredstva nisu dodeljena", "Sr…
## $ `TEMA PROJEKTA`                        <chr> NA, NA, NA, "Informativni prog…
## $ `SREDSTVA U DINARIMA`                  <dbl> 0, 0, 0, 900000, 500000, 20000…
## $ `SREDSTVA U EVRIMA`                    <dbl> 0, 0, 0, 7610, 4243, 1698, 458…
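Totals like the ones above can be computed directly from the amount columns. The sketch below rebuilds only the first five rows and the four columns whose values are fully visible in the printout, with shortened column names chosen for this example, so its results describe those five rows and not the full dataset.

```r
# First five rows as shown in the printout above. Short column names are an
# assumption for readability; the originals are given in the comments.
funding <- data.frame(
  organ  = rep("Ada", 5),                 # ORGAN KOJI RASPISUJE KONKURS/OPŠTINA
  godina = 2015:2019,                     # GODINA
  rsd    = c(0, 0, 0, 900000, 500000),    # SREDSTVA U DINARIMA
  eur    = c(0, 0, 0, 7610, 4243)         # SREDSTVA U EVRIMA
)

dim(funding)                                               # number of rows and columns
total_eur <- sum(funding$eur)                              # total awarded, in euros
years_without_award <- funding$godina[funding$eur == 0]    # years with no funds awarded
funders <- length(unique(funding$organ))                   # number of distinct funding bodies

total_eur
years_without_award
funders
```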

Columns filled in by the researchers

Table with data for one municipality

Final table with all columns

Layout of the table with data for one municipality

Exercise

  • Go to this link to download the folder for this exercise: https://github.com/tixwitchy/Dogs-of-New-York

  • Click the green Clone or download button and choose Download ZIP

  • Unzip the folder and open the DogsofNewYork.Rproj file

  • In the left part of RStudio (the Console), paste the following code and press Enter:

install.packages(c("flexdashboard", "tidyverse", "plotly", "DT"))

First steps

  • After installing R and RStudio, you need to set a working directory where all your work will be stored.

  • The best way to do this is to choose File/New Project, which automatically keeps all your files in the same place.

  • Because we opened the DogsofNewYork.Rproj file, the working directory is already set for us.
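To confirm which folder R is currently working in, a quick check along these lines can help. The variable names are chosen for this example only.

```r
# Absolute path of the current working directory.
wd <- getwd()
wd

# Files R can see in that folder without a full path.
list.files(wd)

# TRUE if the folder contains an RStudio project file (.Rproj).
has_rproj <- length(list.files(wd, pattern = "\\.Rproj$")) > 0
has_rproj
```

If the project was opened through its .Rproj file, the path should point to the downloaded exercise folder.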

R Interface

Packages and Libraries

When you install R, you have basic functions already available within Base R. You can take a look at Introduction to Base R for additional information.

However, to access functions or data written by other people, there are numerous R packages available.

An R package is a bundle of functions (code), data, documentation, and vignettes (examples).

Important note - R is case-sensitive so make sure to check spelling and capitalization!
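A small illustration of case sensitivity, using an object name made up for this example:

```r
dog_count <- 3

exists("dog_count")   # TRUE: this is the name we created
exists("Dog_Count")   # FALSE: different capitalisation means a different name

# Asking for the wrongly capitalised name raises an error. tryCatch() turns
# the error into its message so the script keeps running.
msg <- tryCatch(Dog_Count, error = function(e) conditionMessage(e))
msg
```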

Packages and Libraries-Code

To use an R package, you first install it (once per computer) and then load it with library() in every new R session. Use the following code to install the packages and load them.

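# Install the packages (needed only once per computer)
install.packages("flexdashboard")
install.packages("tidyverse")
install.packages("plotly")
install.packages("DT")

# Load the packages (needed in every new R session)
library(flexdashboard)
library(tidyverse)
library(plotly)
library(DT)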
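Reinstalling packages that are already present takes time, so it can be handy to check first which ones are missing. The helper below is a sketch; the name `missing_packages` is hypothetical, and the actual install call is left as a comment so nothing is downloaded by accident.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("flexdashboard", "tidyverse", "plotly", "DT")
to_install <- missing_packages(needed)
to_install

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```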

Simple use of R

Type the following command in your console and press Enter.

2 + 2
## [1] 4

You use <- to create objects in R. It is called the assignment operator.

x <- 5
y <- 10
z <- x + y
z
## [1] 15

Dataset

The dataset on dog bites is taken from the R package nycdogs by Kieran Healy. For our exercise it has been adapted to include only the year 2017 and several variables. So let us see what the dataset looks like.

Important note: You will rarely come across a dataset that is already prepared for analysis. Usually, you will spend between 50% and 80% of your time cleaning and preparing data.

Importing a dataset

First, we will import and inspect a csv file about dog bites in New York City for 2017 with the following code.
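The import code itself is not reproduced here. A minimal sketch of what such a step could look like follows. The column names and the two rows are invented placeholders, written to a temporary file so that the example runs anywhere. It uses base R's read.csv(), although the workshop may have used a tidyverse function such as readr::read_csv(). In the exercise you would point read.csv() at the csv file in the downloaded project folder instead.

```r
# Placeholder data: NOT the real dog-bite records, only here so the code runs.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("breed,borough", "Breed A,Borough X", "Breed B,Borough Y"), csv_path)

# Import the csv file into a data frame.
dog_bites <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect it: structure, first rows, and size.
str(dog_bites)
head(dog_bites)
dim(dog_bites)
```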

Word of Caution in this Tale

“We infer that something we see in the data applies beyond the time, place and conditions in which it happened to surface.”

— Ben Jones, Avoiding Data Pitfalls

  • To claim that Pit Bulls really are aggressive, we would need to do additional research.

  • Is it valid to draw conclusions from this number of observations? Is the data reliable?

  • That is why experts need to be able to create this type of visualisation. They already have the expertise needed to draw valid conclusions, and this tool can help them reach a wider audience as well as follow and contribute to other people’s work.

Example of combining Flexdashboards and Shiny

Where to publish your dashboard

Additional resources

Great Work and Thank you!